NSF PAR Search | NSF Public Access Repository


FastConformation: A Standalone ML-Based Toolkit for Modeling and Analyzing Protein Conformational Ensembles at Scale

https://doi.org/10.1101/2025.05.09.653048

Galeazzi, Flavia Maria; Monteiro da Silva, Gabriel; Arantes, Pablo; Varghese, Iz; Shukla, Ananya; Rubenstein, Brenda M (May 2025, bioRxiv)

Abstract Deep learning approaches like AlphaFold 2 (AF2) have revolutionized structural biology by accurately predicting the ground state structures of proteins. Recently, clustering and subsampling techniques that manipulate multiple sequence alignment (MSA) inputs into AlphaFold to generate conformational ensembles of proteins have also been proposed. Although many of these techniques have been made open source, they often require integrating multiple packages and can be challenging for researchers who have a limited programming background to employ. This is especially true when researchers are interested in subsampling to produce predictions of protein conformational ensembles, which require multiple computational steps. This manuscript introduces FastConformation, a Python-based application that integrates MSA generation, structure prediction via AF2, and interactive analysis of protein conformations and their distributions, all in one place. FastConformation is accessible through a user-friendly GUI suitable for non-programmers, allowing users to iteratively refine subsampling parameters based on their analyses to achieve diverse conformational ensembles. Starting from an amino acid sequence, users can make protein conformation predictions and analyze results in just a few hours on their local machines, which is significantly faster than traditional molecular dynamics (MD) simulations. Uniquely, by leveraging the subsampling of MSAs, our tool enables the generation of alternative protein conformations. We demonstrate the utility of FastConformation on proteins including the Abl1 kinase, LAT1 transporter, and CCR5 receptor, showcasing its ability to predict and analyze the protein conformational ensembles and effects of mutations on a variety of proteins. This tool enables a wide range of high-throughput applications in protein biochemistry, drug discovery, and protein engineering.
Free, publicly-accessible full text available May 14, 2026
Atomistic descriptor optimization using complementary Euclidean and geodesic distance information

https://doi.org/10.1080/00268976.2024.2381617

Iyer, Gopal R; Rubenstein, Brenda M (February 2025, Molecular Physics)

Descriptors are physically-inspired, symmetry-preserving schemes for representing atomistic systems that play a central role in the construction of models of potential energy surfaces. Although physical intuition can be flexibly encoded into descriptor schemes, they are generally ultimately guided only by the spatial or topological arrangement of atoms in the system. However, since interatomic potential models aim to capture the variation of the potential energy with respect to atomic configurations, it is conceivable that they would benefit from descriptor schemes that implicitly encode both structural and energetic information rather than structural information alone. Therefore, we propose a novel approach for the optimisation of descriptors based on encoding information about geodesic distances along potential energy manifolds into the hyperparameters of commonly used descriptor schemes. To accomplish this, we combine two ideas: (1) a differential-geometric approach for the fast estimation of approximate geodesic distances [Zhu et al., J. Chem. Phys. 150, 164103 (2019)]; and (2) an information-theoretic evaluation metric – information imbalance – for measuring the shared information between two distance measures [Glielmo et al. PNAS Nexus, 1, 1 (2022)]. Using three example molecules – ethanol, malonaldehyde, and aspirin – from the MD22 dataset, we first show that Euclidean (in Cartesian coordinates) and geodesic distances are inequivalent distance measures, indicating the need for updated ground-truth distance measures that go beyond the Euclidean (or, more broadly, spatial) distance. We then utilize a Bayesian optimisation framework to show that descriptors (in this case, atom-centred symmetry functions) can be optimized to maximally express a certain type of distance information, such as Euclidean or geodesic information. 
We also show that modifying the Bayesian optimisation algorithm to minimise a combined objective function – the sum of the descriptor↔Euclidean and descriptor↔geodesic information imbalances – can yield descriptors that not only optimally express both Euclidean and geodesic distance information simultaneously, but in fact resolve substantial disagreements between descriptors optimized to encode only one type of distance measure. We discuss the relevance of our approach to the design of more physically rich and informative descriptors that can encode useful, alternative information about molecular systems.
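The information imbalance metric named in this abstract can be illustrated with a small sketch. It assumes the rank-based definition commonly attributed to Glielmo et al., in which the imbalance from representation A to representation B is 2/N times the average rank, under B, of each point's nearest neighbor under A. The function names and the toy data are hypothetical and are not taken from the paper; the geodesic distance estimation and the Bayesian optimisation are not modeled here.

```python
def neighbor_ranks(points, i, dist):
    """Rank every other point by its distance to point i (1 = nearest)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return {j: rank for rank, j in enumerate(others, start=1)}


def information_imbalance(rep_a, rep_b, dist_a, dist_b):
    """Estimate the imbalance A -> B (assumed definition, see lead-in).

    Values near 0 suggest that neighbors under A are also neighbors
    under B; values near 1 suggest A carries little information about B.
    """
    n = len(rep_a)
    total = 0
    for i in range(n):
        ranks_a = neighbor_ranks(rep_a, i, dist_a)
        ranks_b = neighbor_ranks(rep_b, i, dist_b)
        nearest_in_a = min(ranks_a, key=ranks_a.get)  # neighbor with rank 1 in A
        total += ranks_b[nearest_in_a]                # its rank in B
    return 2.0 * total / (n * n)


def abs_diff(x, y):
    return abs(x - y)


# Toy one-dimensional "representations" of five configurations.
coords = [0.0, 1.0, 3.0, 6.0, 10.0]
scrambled = [3.0, 0.0, 10.0, 1.0, 6.0]

same = information_imbalance(coords, coords, abs_diff, abs_diff)
mixed = information_imbalance(coords, scrambled, abs_diff, abs_diff)
print(same, mixed)
```

Under this assumed definition, a combined objective like the one described in the abstract would be the sum of two such imbalance terms, one against Euclidean distances and one against geodesic distances.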
Free, publicly-accessible full text available February 16, 2026
Toward improved property prediction of 2D materials using many-body quantum Monte Carlo methods

https://doi.org/10.1063/5.0220257

Wines, Daniel; Ahn, Jeonghwan; Benali, Anouar; Kent, Paul R C; Krogel, Jaron T; Kwon, Yongkyung; Mitas, Lubos; Reboredo, Fernando A; Rubenstein, Brenda; Saritas, Kayahan; et al (September 2025, Applied Physics Reviews)

The field of 2D materials has grown dramatically in the past two decades. 2D materials can be utilized for a variety of next-generation optoelectronic, spintronic, clean energy, and quantum computing applications. These 2D structures, which are often exfoliated from layered van der Waals materials, possess highly inhomogeneous electron densities and can possess short- and long-range electron correlations. The complexities of 2D materials make them challenging to study with standard mean-field electronic structure methods such as density functional theory (DFT), which relies on approximations for the unknown exchange-correlation functional. To overcome the limitations of DFT, highly accurate many-body electronic structure approaches such as diffusion Monte Carlo (DMC) can be utilized. In the past decade, DMC has been used to calculate accurate magnetic, electronic, excitonic, and topological properties in addition to accurately capturing interlayer interactions and cohesion and adsorption energetics of 2D materials. This approach has been applied to 2D systems of wide interest, including graphene, phosphorene, MoS2, CrI3, VSe2, GaSe, GeSe, borophene, and several others. In this review article, we highlight some successful recent applications of DMC to 2D systems for improved property predictions beyond standard DFT.
Free, publicly-accessible full text available September 1, 2026
Gaussian processes for finite size extrapolation of many-body simulations

https://doi.org/10.1039/D4FD00051J

Landinez Borda, Edgar Josué; Berard, Kenneth O; Lopez, Annette; Rubenstein, Brenda (November 2024, Faraday Discussions)

We employ Gaussian processes to more accurately and efficiently extrapolate many-body simulations to their thermodynamic limit.
Full Text Available
Compound Mutations in the Abl1 Kinase Cause Inhibitor Resistance by Shifting DFG Flip Mechanisms and Relative State Populations

https://doi.org/10.1101/2024.05.23.595569

Monteiro da Silva, Gabriel; Lam, Kyle; Dalgarno, David C; Rubenstein, Brenda M (May 2024, bioRxiv)

Abstract The intrinsic dynamics of most proteins are central to their function. Protein tyrosine kinases such as Abl1 undergo significant conformational changes that modulate their activity in response to different stimuli. These conformational changes constitute a conserved mechanism for self-regulation that dramatically impacts kinases’ affinities for inhibitors. Few studies have attempted to extensively sample the pathways and elucidate the mechanisms that underlie kinase inactivation. In large part, this is a consequence of the steep energy barriers associated with many kinase conformational changes, which present a significant obstacle for computational studies using traditional simulation methods. Seeking to bridge this knowledge gap, we present a thorough analysis of the “DFG flip” inactivation pathway in Abl1 kinase. By leveraging the power of the Weighted Ensemble methodology, which accelerates sampling without the use of biasing forces, we have comprehensively simulated DFG flip events in Abl1 and its inhibitor-resistant variants, revealing a rugged landscape punctuated by potentially druggable intermediate states. Through our strategy, we successfully simulated dozens of uncorrelated DFG flip events distributed along two principal pathways, identified the molecular mechanisms that govern them, and measured their relative probabilities. Further, we show that the compound Glu255Lys/Val Thr315Ile Abl1 variants owe their inhibitor resistance phenotype to an increase in the free energy barrier associated with completing the DFG flip. This barrier stabilizes Abl1 variants in conformations that can lead to loss of binding for Type-II inhibitors such as Imatinib or Ponatinib. Finally, we contrast our Abl1 observations with the relative state distributions and propensity for undergoing a DFG flip of evolutionarily-related protein tyrosine kinases with diverging Type-II inhibitor binding affinities. 
Altogether, we expect that our work will be of significant importance for protein tyrosine kinase inhibitor discovery, while also furthering our understanding of how enzymes self-regulate through highly-conserved molecular switches.
Full Text Available
Is stochastic thermodynamics the key to understanding the energy costs of computation?

https://doi.org/10.1073/pnas.2321112121

Wolpert, David H; Korbel, Jan; Lynn, Christopher W; Tasnim, Farita; Grochow, Joshua A; Kardeş, Gülce; Aimone, James B; Balasubramanian, Vijay; De Giuli, Eric; Doty, David; et al (November 2024, Proceedings of the National Academy of Sciences)

The relationship between the thermodynamic and computational properties of physical systems has been a major theoretical interest since at least the 19th century. It has also become of increasing practical importance over the last half-century as the energetic cost of digital devices has exploded. Importantly, real-world computers obey multiple physical constraints on how they work, which affects their thermodynamic properties. Moreover, many of these constraints apply to both naturally occurring computers, like brains or Eukaryotic cells, and digital systems. Most obviously, all such systems must finish their computation quickly, using as few degrees of freedom as possible. This means that they operate far from thermal equilibrium. Furthermore, many computers, both digital and biological, are modular, hierarchical systems with strong constraints on the connectivity among their subsystems. Yet another example is that to simplify their design, digital computers are required to be periodic processes governed by a global clock. None of these constraints were considered in 20th-century analyses of the thermodynamics of computation. The new field of stochastic thermodynamics provides formal tools for analyzing systems subject to all of these constraints. We argue here that these tools may help us understand at a far deeper level just how the fundamental thermodynamic properties of physical systems are related to the computation they perform.
Full Text Available
High-throughput prediction of protein conformational distributions with subsampled AlphaFold2

https://doi.org/10.1038/s41467-024-46715-9

Monteiro da Silva, Gabriel; Cui, Jennifer Y.; Dalgarno, David C.; Lisi, George P.; Rubenstein, Brenda M. (March 2024, Nature Communications)

Abstract This paper presents an innovative approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins’ ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against nuclear magnetic resonance experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, analysis of experimental results, and predicting evolution.
Lowering Activation Barriers to Success in Physical Chemistry (LABSIP): A Community Project

https://doi.org/10.1021/acs.jpca.3c07015

Baiz, Carlos R; Berger, Robert F; Donald, Kelling J; de Paula, Julio C; Fried, Stephen D; Rubenstein, Brenda; Stokes, Grace Y; Takematsu, Kana; Londergan, Casey (January 2024, The Journal of Physical Chemistry A)

Full Text Available
Stable recursive auxiliary field quantum Monte Carlo algorithm in the canonical ensemble: Applications to thermometry and the Hubbard model

https://doi.org/10.1103/PhysRevE.107.055302

Shen, Tong; Barghathi, Hatem; Yu, Jiangyong; Del Maestro, Adrian; Rubenstein, Brenda M. (May 2023, Physical Review E)

Full Text Available
Digital circuits and neural networks based on acid-base chemistry implemented by robotic fluid handling

https://doi.org/10.1038/s41467-023-36206-8

Agiza, Ahmed A.; Oakley, Kady; Rosenstein, Jacob K.; Rubenstein, Brenda M.; Kim, Eunsuk; Riedel, Marc; Reda, Sherief (January 2023, Nature Communications)

Abstract Acid-base reactions are ubiquitous, easy to prepare, and execute without sophisticated equipment. Acids and bases are also inherently complementary and naturally map to a universal representation of “0” and “1.” Here, we propose how to leverage acids, bases, and their reactions to encode binary information and perform information processing based upon the majority and negation operations. These operations form a functionally complete set that we use to implement more complex computations such as digital circuits and neural networks. We present the building blocks needed to build complete digital circuits using acids and bases for dual-rail encoding data values as complementary pairs, including a set of primitive logic functions that are widely applicable to molecular computation. We demonstrate how to implement neural network classifiers and some classes of digital circuits with acid-base reactions orchestrated by a robotic fluid handling device. We validate the neural network experimentally on a number of images with different formats, resulting in a perfect match to the in-silico classifier. Additionally, the simulation of our acid-base classifier matches the results of the in-silico classifier with approximately 99% similarity.
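The logic described in this abstract can be sketched in software, purely at the Boolean level. The example below assumes a dual-rail value is a pair holding a bit and its complement, that negation swaps the two rails, and that constant 0 and 1 inputs are available to the majority operation. All names are hypothetical, and no acid-base chemistry or robotic fluid handling is modeled.

```python
from typing import NamedTuple


class DualRail(NamedTuple):
    """A bit stored as a complementary pair, as in dual-rail encoding."""
    value: bool
    complement: bool


def encode(bit) -> DualRail:
    return DualRail(bool(bit), not bit)


def negate(x: DualRail) -> DualRail:
    # Negation needs no computation in dual-rail form: swap the rails.
    return DualRail(x.complement, x.value)


def _maj(p: bool, q: bool, r: bool) -> bool:
    return (p + q + r) >= 2


def majority(a: DualRail, b: DualRail, c: DualRail) -> DualRail:
    # Majority is self-dual, so the complement rail is the majority
    # of the three complement rails.
    return DualRail(
        _maj(a.value, b.value, c.value),
        _maj(a.complement, b.complement, c.complement),
    )


ZERO = encode(0)
ONE = encode(1)


def and_(a: DualRail, b: DualRail) -> DualRail:
    return majority(a, b, ZERO)  # majority with a constant 0 acts as AND


def or_(a: DualRail, b: DualRail) -> DualRail:
    return majority(a, b, ONE)   # majority with a constant 1 acts as OR


def xor_(a: DualRail, b: DualRail) -> DualRail:
    return or_(and_(a, negate(b)), and_(negate(a), b))


for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor_(encode(x), encode(y)).value)
```

Building AND, OR, and XOR from majority and negation alone is what makes the pair, together with constants, sufficient for arbitrary digital circuits in this sketch.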
